Exploiting Variety-Dependent Phones in Portuguese Variety Identification

Authors

  • Oscar Koller
  • Alberto Abad
  • Isabel Trancoso
Abstract

This paper presents a new approach to building a language identification system that uses a specialized Phone Recognition followed by Language Modeling (PRLM) system to differentiate the Portuguese varieties spoken in African countries from European Portuguese. The system is designed to exploit the phonotactic information of a single tokenizer that is discriminatively trained for the specific pair of target varieties. In contrast to other PRLM-based methods, this single tokenizer already combines distinctive knowledge about the differences between the two target varieties. This knowledge is introduced into a dedicated multiple-stream Multi-Layer Perceptron (MLP) phone recognizer by training mono-phoneme models for the two varieties as contrasting phoneme-like classes within a single tokenizer. Significant improvements in identification rate and computational cost were achieved compared to conventional single-tokenizer PRLM-based systems and to the combination of up to five parallel PRLM identifiers. The method is also applied to other varieties of Portuguese, yielding similar results.

Keywords: Variety identification; Portuguese varieties
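The language-modeling stage of a PRLM system with variety-tagged phones can be illustrated with a minimal sketch. It is not the authors' implementation: the MLP tokenizer is not modeled, the phone labels (`s_EP`, `a_AP`, ...), the toy training sequences, and the names `BigramLM` and `identify` are all hypothetical, and an add-one smoothed bigram model stands in for whatever language model the paper actually uses. The sketch only shows how phone sequences containing variety-dependent classes can be scored by one phonotactic model per variety.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram model over phone tokens (illustrative only)."""

    def __init__(self, sequences, vocab):
        # +1 accounts for the end-of-sequence symbol as a possible next token.
        self.vocab_size = len(vocab) + 1
        self.bigrams = Counter()
        self.contexts = Counter()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, seq):
        """Return the smoothed log-likelihood of a phone sequence."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + self.vocab_size
            total += math.log(num / den)
        return total


def identify(seq, models):
    """Pick the variety whose phonotactic model scores the sequence highest."""
    scores = {name: lm.log_prob(seq) for name, lm in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical tokenizer output: some phones carry a variety tag
# (EP = European Portuguese, AP = African Portuguese), others are shared.
train = {
    "EP": [["s_EP", "a_EP", "l"], ["a_EP", "l", "s_EP"]],
    "AP": [["s_AP", "a_AP", "l"], ["a_AP", "l", "s_AP"]],
}
vocab = sorted({p for seqs in train.values() for seq in seqs for p in seq})
models = {name: BigramLM(seqs, vocab) for name, seqs in train.items()}

label, scores = identify(["s_AP", "a_AP", "l"], models)
print(label, scores)
```

Because the variety information is already carried by the phone labels, a single tokenizer pass feeds both language models, which is the property the abstract contrasts with running several parallel tokenizers.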

Similar resources

Exploiting variety-dependent phones in Portuguese variety identification applied to broadcast news transcription

This paper presents a Variety IDentification (VID) approach and its application to broadcast news transcription for Portuguese. The phonotactic VID system, based on Phone Recognition and Language Modelling, focuses on a single tokenizer that combines distinctive knowledge about differences between the target varieties. This knowledge is introduced into a Multi-Layer Perceptron phone recognizer ...

Transcription of Multi-variety Portuguese Media Contents

Current automatic transcription technology applied to media contents is an important medium that not only allows generating subtitles, but also enables data search and retrieval capabilities over multimedia streams. Among others, one of the most important challenges that transcription systems have to deal with is speaker accent variability. In this work we study the importance of accent variabi...

Automatic Speech Recognition and Identification of African Portuguese

This document deals with speech recognition of different Portuguese varieties; it summarizes results from the author’s diploma thesis [9]. The performance of a hybrid large vocabulary continuous speech recognizer, which combines multi-layer perceptrons and Hidden Markov Models, degrades heavily in the presence of African Portuguese varieties in broadcast news. Variety-specific acoustic and languag...

Language and variety verification on broadcast news for Portuguese

This paper describes a language/accent verification system for Portuguese that explores different types of properties: acoustic, phonotactic and prosodic. The two-stage system is designed to be used as a pre-processing module for the Portuguese Automatic Speech Recognition (ASR) system developed at INESC-ID. As the ASR system is applied every day to transcribe the evening news from a Portuguese ...

Proposing a Model for Patient Admission and NFC Mobile Payment by Biometric Identification and Smart Health Card

Following the advances in mobile communication and information technology, smartphones have been used in a wide variety of commercial, social, entertainment, file sharing and health transactions and applications. The current procedures in the healthcare environment for patient registration, appointment scheduling and payment are time-consuming and somewhat tiresome. Traditionally, patie...

Publication date: 2010